Conditional autoencoder asset pricing models for the Korean stock market

This study analyzes the explanatory power of the latent factor conditional asset pricing model for the Korean stock market using an autoencoder. The autoencoder is a type of neural network in machine learning that can extract latent factors. Specifically, we apply the conditional autoencoder (CA) model that estimates factor exposure as a flexible nonlinear function of covariates. Our main findings are as follows. The CA model showed excellent explanatory power not only in the entire sample but also in several subsamples in the Korean market. Also, because of this explanatory power, it can better explain market anomalies compared to the traditional asset pricing models. As a result of examining investment strategies using pricing error, the CA model measures the expected return of stocks better than the traditional asset pricing model. In addition, the CA model indicates that the firm characteristic variables are important in asset pricing conditional on macro-financial states, such as the global financial crisis and the coronavirus disease 2019 pandemic. The result shows that the major variables considered in the explanation of stock returns through the CA model may vary depending on the time. This is expected to provide a broader perspective on asset pricing through the CA model in the future.


Introduction
Various risk factors have been introduced in the extant literature to explain cross-sectional stock returns or market anomalies that traditional asset pricing models fail to elucidate [1][2][3]. These efforts have led to the inclusion of numerous factors, and this phenomenon of factor overflow is called factor zoo [4]. Indeed, scholars have endeavored to identify factors that provide information orthogonal to other existing factors, considering the multi-dimensionality of numerous factors. Thus, distinguishing between significant and unnecessary pricing factors becomes essential for identifying the true factors.
Alternatively, rather than specifying factors in advance based on the empirically observed cross-sectional characteristics, researchers have attempted to discover potential factors that can best explain the stock returns through ex-post and bottom-up approaches. A popular idea is to estimate latent factors by extracting information from high-dimensional data using machine learning. Such attempts to use latent factors for asset pricing began with Ross'

Literature review
Our paper also contributes to the growing field of machine learning in finance. Recent literature applies machine learning to equity return forecasting, asset pricing, and risk management. Distinguished by algorithm, researchers have tested shrinkage methods [12,13], the class of support vector machines [13][14][15][16], and tree-based methods [17,18] such as the gradient boosting machine or the random forest. Furthermore, many papers have applied various neural network architectures to predict future asset prices [19][20][21]. Other, less widespread methodologies include natural language processing [22], principal component analysis [23], autoencoders [24], and reinforcement learning [21,25]. The use of artificial intelligence continues in the study of asset pricing models. Most applications, in line with the traditional asset pricing literature, consider only linear relationships between financial variables and subsequent stock returns. For example, the Capital Asset Pricing Model (CAPM) introduced by Sharpe [26], Lintner [27], and Mossin [28] posits that, in equilibrium, a stock's expected return is solely driven by its sensitivity to a systematic risk factor, i.e., the market risk. The underlying assumption is that the pricing kernel is linear in a single factor, i.e., the market portfolio.
Various studies, however, report violations of this assumption (see Hou et al. [3] for a comprehensive list of asset pricing anomalies) and examine alternative asset pricing models. Following Dittmar [29], we classify them into two subcategories. The first subcategory utilizes other pricing factors in addition to the market portfolio. Most prominently, Fama and French [30] propose a multifactor alternative to the CAPM and find that it is better at explaining cross-sectional variation in expected returns than the CAPM. Other examples include Ross' arbitrage pricing theory (APT) [5] and Merton's intertemporal CAPM (ICAPM) [31]. The second subcategory abandons the restriction that the pricing kernel must be linear in pricing factors. Bansal et al. [32], Bansal and Viswanathan [33], Chapman [34], Dittmar [29], and Asgharian and Karlsson [35], among others, explore various nonlinear pricing kernels and show that such specifications outperform linear counterparts.
While the first subcategory of models motivates the use of multiple pricing factors, the second subcategory suggests that using interactions between these factors and incorporating nonlinear relationships between price-related variables and expected stock returns add incremental explanatory power. For this, many studies use machine learning methods. Messmer [36] and Feng et al. [37] predicted stock returns with neural networks. Bianchi et al. [38] used a machine learning method to predict bond yields and compared the predictions with those of traditional methods. Freyberger et al. [39] estimated the risk premium of stock returns through a nonlinear additive function using the Lasso selection method. Feng et al. [37] impose a no-arbitrage constraint using a predefined set of linear asset pricing factors and measure the loading on each of these factors through a deep neural network. Rossi [40] derived a conditional mean-variance efficient portfolio based on the market portfolio and risk-free assets through boosted regression trees. More recently, new methods have been developed to extract statistical asset pricing factors from large panels with various derivatives of principal component analysis (PCA). The Risk-Premium PCA, suggested by Lettau and Pelger [23], introduced a pricing error penalty to detect weak factors that explain the cross-sectional variance of returns. The high-frequency PCA, suggested by Pelger [41], utilized high-frequency data to estimate local time-varying latent factors.
Extending PCA models, which are static, Kelly et al. [9] propose an instrumented PCA (IPCA) asset pricing model, which includes unobservable latent factors and factor loadings that change over time. The IPCA model explains stock returns better than the PCA models or the FF models for the U.S. stock markets; IPCA renders the pricing errors (alphas) of many firm characteristic-managed portfolios insignificant. Inspired by the IPCA model, Gu et al. [10] proposed a conditional autoencoder (CA) asset pricing model to reflect the influence of external variables. The autoencoder, one of the machine learning methods, is a dimension reduction technique that can infer nonlinear relationships among data; that is, it is a generalized PCA with nonlinearity [11]. Gu et al. [10] find that the CA model exhibits better explanatory power for stock returns than the FF and IPCA models for the U.S. market. These results indicate that the latent factors estimated through machine learning techniques can be effectively employed in asset pricing models.
The most recent study, Gu et al. [10], extended the linear conditional factor model of Kelly et al. [9] to a nonlinear factor model using an autoencoder neural network. In this study, we replicate the model of Gu et al. [10] in the Korean market and check whether the explanatory power of an asset pricing model built with machine learning generalizes beyond the U.S. market.

Data and methodology
This chapter describes the scheme and the evaluation of the CA model. The data used for the CA model, the structure of the CA model, and the evaluation are elaborated. Comparative models are also described. S1 Fig in S1 Appendix is a schematic diagram of the CA model learning process. The methodology used in this study consists of four steps.
The first step is data collection. Market price and fundamental data of KOSPI and KOSDAQ stocks are collected for the Korean market. The second step is data preprocessing. We merge the data sets and create the firm characteristic variables used in the analysis. We then divide the data into the training, validation, and test sets needed for model learning and separate the samples for the subsample tests. Step 3 is model training. With the training data generated in step 2, the CA model described in Section 3.2 is trained. The final model is created through hyperparameter tuning on the validation data.
Step 4 is evaluation. In this step, the main findings of the study are drawn: we analyze the explanatory power of the model, the portfolio alpha test, the APT strategy performance comparison, and the importance of firm-specific variables. In this chapter, the main parts of this process are explained in detail.
To avoid a forward-looking bias, following Gu et al. [10], we match realized returns at month t with the most recent monthly variables at the end of month t−1 and the most recent annual variables as of t−6. To eliminate the influence of outliers and facilitate model learning, we normalize all characteristics into the interval (−1, 1) for each month t as in Eq (1). The first and second categories of characteristics comprise monthly variables and annual variables, respectively. The studies that suggest the corresponding firm characteristic variables covered in this research are included in parentheses.

No. Acronym Definition of the characteristic
1 beta Estimated market beta from weekly returns and market returns for 3 years ending month t−1 with at least 52 weeks of returns [40].
Specifically, we calculate the monthly rank of each firm characteristic variable in the cross-section. Subsequently, we divide this rank by the total number of stocks, multiply by two, and subtract one. As a result, each variable is standardized with a maximum value of 1 and a minimum value of −1 (Eq (1)). The reason for this standardization is that outliers can excessively influence the machine learning model [70].
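The rank transformation described above can be sketched in a few lines. This is a minimal illustration, assuming ranks that run from 1 to N and ties broken by position; the function name `rank_normalize` is hypothetical. Under these assumptions the largest value maps to exactly 1 and the smallest to 2/N − 1, which approaches −1 as the cross-section grows.

```python
def rank_normalize(values):
    """Map one month's cross-section of a characteristic to 2 * rank / N - 1."""
    n = len(values)
    # Indices sorted by characteristic value; ties keep their original order (assumption).
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return [2.0 * r / n - 1.0 for r in ranks]


# Four stocks with raw characteristic values in one month.
normalized = rank_normalize([10.0, 30.0, 20.0, 40.0])
```

Because only the ordering of the raw values matters, an extreme observation cannot move the transformed variable outside the fixed range.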

Conditional autoencoder
The CA structure is based on the arbitrage pricing theory. The static linear factor model used in many studies can be described as in Eq (2) [7,71].
$$ r_t = \beta f_t + e_t \tag{2} $$
where $r_t$ is the N × 1 vector of returns in excess of the risk-free interest rate, $f_t$ is the K × 1 vector of factor returns, and $e_t$ is the N × 1 vector of idiosyncratic errors. Further, $\beta$ is the N × K matrix of factor loadings, where N is the number of assets and K is the number of factors. This is the same form as the general (standard) factor model used in empirical finance. For example, the Fama and French [30,72] three-factor (FF3) model uses observable factors in the financial market, such as market return, SMB, and HML. By contrast, Bai and Ng [73] and Stock and Watson [74] examine latent factors through dimensionality reduction of the return covariance matrix using methods such as PCA.
The PCA model can be used to infer latent factors and their loadings from returns. However, PCA is an unsupervised learning technique that does not reflect external information, so the estimated betas are constant over time and states. To solve this problem, Kelly et al. [9] extend PCA and propose an IPCA model that infers latent factors and dynamically changing betas through external variables, as in Eq (3). In this model, the beta varies dynamically with firm characteristics and the market environment.
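For reference, a conditional specification of this kind is commonly written as below. This is a sketch based on the usual statement of the IPCA model of Kelly et al. [9], using the symbols of this section ($z_{i,t-1}$ for the characteristics of stock i and $\Gamma$ for the mapping from characteristics to loadings); the t−1 timing subscript on the beta is an assumption, and the block is not a transcription of Eq (3).

```latex
\begin{aligned}
r_{i,t} &= \beta_{i,t-1}^{\prime} f_t + e_{i,t}, \\
\beta_{i,t-1}^{\prime} &= z_{i,t-1}^{\prime}\,\Gamma .
\end{aligned}
```

With P characteristics and K factors, $\Gamma$ is a P × K matrix, so each beta is a linear function of lagged characteristics; the CA model replaces this linear map with a neural network.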
We use the autoencoder, a generalized version of PCA, to guide dimension reduction. The autoencoder used in this study is a deep neural network model that has been introduced in many studies related to dimensionality reduction [75][76][77]. The main principle of the autoencoder is reflected in its name: "auto" indicates that the method learns without labels (unsupervised learning), and "encoder" means that it learns a different representation of the data. In particular, the autoencoder learns the encoded representation by minimizing the loss between the original data and the decoded data. In other words, an autoencoder is a neural network that encodes input data into a low-dimensional representation and then decodes it again, and it is trained so that the output reproduces the input itself. The lower-dimensional representation forces the autoencoder to capture the most important features of the data. Due to these characteristics, autoencoders can be considered nonlinear generalizations of PCA [78].
Particularly, we apply the CA to infer the factors ($f_t$) and factor loadings ($\beta_{i,t}$). As illustrated in Eq (3), the CA model allows a dynamic factor loading, $\beta_{i,t}$, on the latent factors, $f_t$, extracted through the autoencoder. Further, the model has a nonlinear beta that changes with the firm characteristic variables. $z_{i,t-1}$ represents the firm characteristic variables, and the matrix $\Gamma$ defines the mapping between many characteristics and a small number of latent factors. Fig 1 describes the CA architecture and this mapping in detail. The left-hand side depicts how the $\beta$ is deduced. In model learning, we use stock returns and firm characteristic variables for N stocks over T periods. As the firm characteristic variables for each stock pass through the hidden layers, they are compressed to K dimensions. Note that Fig 1 only illustrates the learning structure at a specific time t.
The right-hand side demonstrates the process by which the latent factors are extracted from individual stock returns. The return data of N stocks are compressed into K dimensions to form the latent factors. To accomplish this, following the method of Gu et al. [10], the N individual stocks are first combined into P long-short portfolios, one for each of the P firm characteristic variables. This reduces the data dimension from N to P. As the number of network nodes in the autoencoder model reduces significantly (because of using P rather than N variables), model learning is facilitated. Additionally, when learning from portfolio returns, the autoencoder mitigates the noise generated by individual stock returns. Finally, the results calculated on the left-hand and right-hand sides are combined through the matrix product (N × K) × (K × 1) to create the N × 1 vector of returns.
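The two sides of the architecture can be illustrated with a single forward pass for one month. This is a minimal sketch in plain Python, not the trained model: the ReLU activation, the single linear layer on the factor side, the random initial weights, and all function names are assumptions made for illustration, and the P portfolio returns are taken as given rather than constructed from individual stocks.

```python
import random


def relu(v):
    return v if v > 0.0 else 0.0


def init_layer(n_in, n_out, rng):
    """Random weights (n_out x n_in) and zero biases for one dense layer."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation=None):
    """Apply one dense layer to the input vector x."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def ca_forward(Z, x, beta_layers, factor_layer):
    """Return (betas, factors, fitted returns) for one month.

    Z: N x P firm characteristics (left-hand side input).
    x: P characteristic-sorted portfolio returns (right-hand side input).
    """
    factors = dense(x, factor_layer)                  # K latent factors
    betas = []
    for z in Z:                                       # one stock at a time
        h = z
        for layer in beta_layers[:-1]:
            h = dense(h, layer, relu)                 # hidden layers (ReLU assumed)
        betas.append(dense(h, beta_layers[-1]))       # K loadings, linear output
    # (N x K) times (K x 1): one fitted return per stock.
    fitted = [sum(b * f for b, f in zip(beta, factors)) for beta in betas]
    return betas, factors, fitted


rng = random.Random(0)
N, P, K = 5, 4, 2
Z = [[rng.uniform(-1.0, 1.0) for _ in range(P)] for _ in range(N)]
x = [rng.gauss(0.0, 0.05) for _ in range(P)]
beta_layers = [init_layer(P, 8, rng), init_layer(8, K, rng)]
factor_layer = init_layer(P, K, rng)
betas, factors, fitted = ca_forward(Z, x, beta_layers, factor_layer)
```

The betas depend only on characteristics and the factors only on returns, which is what allows the loadings to vary with conditioning information while the factors remain common to all stocks.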
The CA model has certain advantages compared to the standard autoencoder model illustrated in S1 Fig in S1 Appendix. As the standard autoencoder model has output and input layers identical to the number of stocks, it cannot reflect external market information in beta that can change over time. However, our model depicted in Fig 1 can reflect the external information on the left-hand side of the beta part, and consequently, we can estimate the dynamic betas.

Model learning
We divide the sample data into three disjoint periods, "train," "validation," and "test." Considering the characteristics of time series data, instead of shuffling, we separate the data while maintaining the time order. Specifically, the "train" subsample comprises data for estimating the model according to a specific set of tuning hyperparameter values. The "validation" subsample is used to tune the hyperparameters of the model. Finally, we apply the "test" subsample, which has never been used for "train" or "validation," to evaluate the method's OOS performance.
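A chronological split with annual refitting can be sketched as follows. The defaults use the period lengths reported in this section (data from 1991 to 2020, 13 initial training years, 2 validation years); the function name and the choice to evaluate each refit on the single following year are assumptions made for illustration.

```python
def expanding_window_splits(first_year=1991, last_year=2020, n_train=13, n_val=2):
    """List (train_years, validation_years, test_year) for each annual refit.

    The training window always starts at first_year and grows by one year per
    refit; the validation window keeps its length and rolls forward.
    """
    splits = []
    train_end = first_year + n_train - 1
    while train_end + n_val + 1 <= last_year:
        train_years = list(range(first_year, train_end + 1))
        val_years = list(range(train_end + 1, train_end + n_val + 1))
        test_year = train_end + n_val + 1   # assumed: test on the next year only
        splits.append((train_years, val_years, test_year))
        train_end += 1
    return splits


splits = expanding_window_splits()
first_train, first_val, first_test = splits[0]
```

No shuffling occurs at any point, so every validation and test year lies strictly after the data used for estimation.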
Considering the 30-year sample period from January 1991 to December 2020, we set the first 13 years of data as the "train" subsample (1991-2003), the next 2 years as the "validation" subsample (2004-2005), and the remaining 15 years as the "test" subsample (2006-2020). We refit the model once a year, and each time we refit, we extend the "train" subsample by one year. We maintain the size of the "validation" subsample by rolling it forward to include the most recent data.
Table 2 displays the hyperparameters used in our CA model learning process. To prevent overfitting, we apply various regularizations during the network training. First, we apply the L1 regularization to the objective function of the neural network model, as in Eq (4).
$$ \min_{\theta} \; \frac{1}{NT} \sum_{t=1}^{T} \sum_{i=1}^{N} \left( r_{i,t} - \hat{\beta}_{i,t-1}^{\prime} \hat{f}_t \right)^2 + \phi(\theta) \tag{4} $$
where $\phi(\theta)$ refers to the L1 (norm) penalty term, which corresponds to placing a Laplace prior on the weights and makes the weights sparse, $r_{i,t}$ refers to the actual return of stock i, and $\hat{\beta}_{i,t-1}^{\prime} \hat{f}_t$ is the return fitted by the model. The CA model minimizes the above objective function with the Adam optimizer [79]. Adam is a stochastic optimization method based on adaptive estimates of lower-order moments and is known to work adequately under non-stationary, noisy, and sparse gradients. Second, we apply batch normalization, which reduces the effect of layer initialization and mitigates the covariate shift problem, enabling smoother learning [80,81]. It reduces the training time and improves the generalization of the model by preventing overfitting.
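An L1-penalized objective of this type can be sketched as below. This is a minimal illustration, assuming a mean squared error term plus a penalty of the form `lam` times the sum of absolute weights; the name `lam` and its value are hypothetical and are not taken from Table 2.

```python
def l1_penalized_loss(actual, fitted, weights, lam):
    """Mean squared pricing error plus an L1 penalty on the network weights.

    actual, fitted: returns flattened over all stock-month observations.
    weights: all trainable weights, flattened into one list.
    lam: penalty strength (hypothetical hyperparameter name).
    """
    n_obs = len(actual)
    mse = sum((r - r_hat) ** 2 for r, r_hat in zip(actual, fitted)) / n_obs
    penalty = lam * sum(abs(w) for w in weights)
    return mse + penalty


loss = l1_penalized_loss([0.1, -0.2], [0.0, 0.0], [1.0, -2.0], lam=0.01)
```

Because the penalty grows with the absolute size of every weight, the optimizer is pushed to set uninformative weights to zero, which is the sparsity effect mentioned above.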
Third, we apply early stopping, an algorithm that terminates learning when the error on the validation data keeps increasing for a certain period while the model is trained with the "train" subsample, to prevent excessive overfitting.
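The stopping rule can be sketched as follows. This is a simplified illustration that takes a precomputed sequence of per-epoch validation losses instead of training a network; the `patience` parameter name and the rule of stopping after `patience` consecutive epochs without improvement are assumptions.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, best_loss) under a patience-based stopping rule."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break   # stop training; keep the weights from best_epoch
    return best_epoch, best_loss


# Validation loss falls, then rises for two epochs: training stops before the last value is seen.
best_epoch, best_loss = early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.9, 0.6], patience=2)
```

In this example the final loss of 0.6 is never reached, which shows the trade-off: a small patience saves time and limits overfitting but can stop before a later improvement.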
Finally, we employ the ensemble method, which creates and trains five same-structure models with different random seeds. Given the nature of the neural network, the performance varies depending on the random seed. The ensemble method mitigates the problem and makes the results robust. We obtain the final output of the model by averaging the outputs of the five models.
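The averaging step of the ensemble can be sketched as below. This is a minimal illustration in which each inner list stands for the fitted returns of one model trained with a different random seed; the function name is hypothetical.

```python
def ensemble_average(predictions_per_model):
    """Average fitted returns element by element across models."""
    n_models = len(predictions_per_model)
    return [sum(values) / n_models for values in zip(*predictions_per_model)]


# Two models, two stocks (the study averages five models).
averaged = ensemble_average([[1.0, 2.0], [3.0, 4.0]])
```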

Model comparison
We compare the performance of the latent factor models as follows. First, we vary the CA model according to the specifications of hidden layers. The CA0 model has no hidden layer. The CA1 model adds a hidden layer of 32 neurons, while the CA2 model adds a hidden layer of 16 neurons to CA1. Further, the CA3 model adds a hidden layer of eight neurons to CA2.
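The four specifications can be written down as layer-width lists. This is a minimal sketch, assuming that the hidden layers are followed by an output layer of K loadings and that the input has one node per firm characteristic; the helper name `beta_network_shapes` is hypothetical.

```python
# Hidden-layer widths of the beta network for each specification.
HIDDEN_LAYERS = {
    "CA0": [],
    "CA1": [32],
    "CA2": [32, 16],
    "CA3": [32, 16, 8],
}


def beta_network_shapes(model, n_characteristics, n_factors):
    """Return the (inputs, outputs) size of every dense layer in the beta network."""
    sizes = [n_characteristics] + HIDDEN_LAYERS[model] + [n_factors]
    return list(zip(sizes[:-1], sizes[1:]))


shapes_ca2 = beta_network_shapes("CA2", n_characteristics=38, n_factors=5)
```

Under these assumptions CA0 reduces to a single linear map from characteristics to loadings, which makes it the natural linear benchmark for the deeper variants.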
We analyze the FF asset pricing models, which are based on observable factors, to enable comparison with the above-mentioned CA models. We compare the explanatory power of the traditional asset pricing models with that of the AI-based CA models. This comparison is natural because the number of factors can be controlled in both the CA model and the traditional asset pricing model. Table 3 presents the composition of the observable factors in the FF models according to the number of factors K, which we set from one to six to match the number of latent factors created by [2]. Additionally, we add the UMD factor to the FF5 factors in the six-factor model, so that CA models can be compared with observable-factor models that have the same number of factors.

Asset pricing performance
In this section, we compare the performances of the CA and traditional asset pricing models. We evaluate the OOS performance using the data from January 2006 to December 2020. To verify the explanatory power of the model, we examine both individual stock returns and test portfolios created through firm characteristic variables. For the test portfolio, we construct a portfolio return through a "bottom-up" approach, following Gu et al. [24] and Feng et al. [83], as in Eq (5).
The portfolio return ($R_p$) is measured by the weighted sum of the returns $r_{i,t}$ of N individual stocks. We construct a set of 5 × 5 portfolios by crossing two firm characteristics. Each firm characteristic produces five groups of firms. In the bivariate comparisons, we always include firm size. As our analysis covers 38 firm characteristic variables, we generate a total of 950 portfolios as test assets. We use the total $R^2$ to evaluate the performance, as in Eq (6).
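A total R² of this kind can be sketched as below, assuming the definition used by Gu et al. [10]: one minus the ratio of the sum of squared pricing residuals to the sum of squared returns, with no demeaning in the denominator. The function name and the flat-list inputs are illustrative assumptions.

```python
def total_r2(actual, fitted):
    """Total R-squared over all stock-month (or portfolio-month) observations.

    actual: realized excess returns.
    fitted: model-implied returns (beta times factor) for the same observations.
    """
    sum_sq_residuals = sum((r - r_hat) ** 2 for r, r_hat in zip(actual, fitted))
    sum_sq_returns = sum(r ** 2 for r in actual)   # not demeaned (assumption)
    return 1.0 - sum_sq_residuals / sum_sq_returns


r2 = total_r2([1.0, -1.0, 2.0], [1.0, 0.0, 1.0])
```

Under this definition the measure is negative whenever the model fits worse than a constant forecast of zero, so it is not bounded below by zero.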
The total $R^2$ ($R^2_{total}$) indicates the extent to which the latent factors derived by the model explain the stock return, $r_{i,t}$ refers to the actual return, and the product of beta and factor ($\hat{\beta}_{i,t-1}^{\prime} \hat{f}_t$) denotes the return estimated by the CA model. For the portfolios, $r_{i,t}$ is the portfolio return, and $\hat{\beta}_{i,t-1}^{\prime} \hat{f}_t$ is calculated similarly.
Table 4 presents the OOS total $R^2$ (%) for individual stocks using the observable factor models (FF) and the CA models (CA0 to CA3). In all cases, the number of factors varies from K = 1 to K = 6. The total $R^2$ from our CA model generally dominates that from the FF models, which implies that the CA model has more explanatory power. Interestingly, the total $R^2$ of the CA model becomes larger as the number of factors K increases, while that of the FF models is not significantly affected by the number of factors. The results imply that, in the case of the FF models, the information in added factors is largely subsumed by other existing factors. Conversely, in the case of the CA model, the model performance improves linearly with the inclusion of additional factors, suggesting that the model effectively extracts the latent factor information.
We examine the consistency of these results not only in individual stocks but also at the portfolio level in Table 5. The 5 × 5 portfolios sorted on firm characteristic variables come in two types: value-weighted (VW) and equal-weighted (EW) portfolios. The total $R^2$ trend in Table 5 is similar to that of individual stocks in Table 4.
First, in the case of EW, the table demonstrates that the explanatory power of the CA model is superior to that of the FF models. Second, the explanatory power of the CA model tends to increase as K increases, regardless of EW and VW portfolio types. In the case of the FF models, VW portfolios display mostly high explanatory power when K exceeds three. Further, the explanatory power increases significantly when K increases from one to two, that is, when the FF model adds the SMB factor representing the size effect. This is natural because the VW portfolios are market-cap weighted. Nevertheless, when K increases further, the gain in explanatory power is relatively small. Furthermore, we examine how the explanatory power of the model changes over time. Fig 3 compares the explanatory power of the CA models with that of the FF3 and FF5 models. For comparison, we consider the CA model with K = 3 and K = 5. In this analysis, $R^2$ is the 12-month moving average value using monthly data. In Fig 3, the explanatory power of each model moves differently depending on the time point; however, the CA model is always superior to the FF models. These results confirm that the explanatory power of the CA models is superior regardless of timing or macro situations.
Table 4. Out-of-sample total $R^2$ for individual stocks. This table reports the out-of-sample total $R^2$ (%) for individual stocks using observable Fama-French (FF) factor models and conditional autoencoder (CA) models (CA0 through CA3). It presents the results for each model with the number of factors ranging from 1 to 6.

Subsample analysis
Here, we examine the explanatory power in various subsamples to test whether the CA model provides generalized explanatory power regardless of subsample composition. To this end, we examine five subsample classifications: industry, market, firm size, penny stocks, and market-inefficient stocks. To simplify, we set K = 5, so that the CA and FF5 models have the same number of factors for comparison.
We tested the explanatory power of the models on subsamples from four perspectives. First, we compared the explanatory power between the KOSPI market and the KOSDAQ market, because the Korean stock market is largely divided into these two markets. Second, we examined how the explanatory power differs with the presence of penny stocks. Third, we studied whether investor irrationality affects the explanatory power. Finally, we divided the sample randomly and tested whether the explanatory power of the CA model is robust. Market anomalies can be more pronounced for KOSDAQ-listed stocks. The traditional asset pricing model is known to have low explanatory power for KOSDAQ-listed stocks [84], which is akin to the findings for other international stock markets. We therefore test whether the CA model exhibits superior explanatory power in both the KOSPI and KOSDAQ markets and examine the difference in explanatory power between the two. Table 6 reports the OOS total $R^2$ for individual stocks. The results in Table 6 confirm that the explanatory power for KOSDAQ-listed stocks is relatively low compared to KOSPI-listed stocks in both the FF and CA models.
In the case of the FF models, the total $R^2$ of KOSDAQ-listed stocks (2.885) is 63% lower than that of KOSPI-listed stocks (7.869). However, in the case of the CA models, the difference is less dramatic: only about a 20% decrease from KOSPI-listed stocks (17%) to KOSDAQ-listed stocks (14%). Overall, while the explanatory power of the CA models decreases in the KOSDAQ market, the CA model still dominates the FF models and exhibits stable performance. The results of this study are consistent with Han et al. (2020), as traditional asset pricing models show low explanatory power in the KOSDAQ market. Because the explanatory power of the existing asset pricing model is itself low in the KOSDAQ market, using the model there has many limitations. However, the CA model has superior explanatory power compared to the existing asset pricing model, and the difference in its explanatory power between the KOSPI market and the KOSDAQ market is not large. This means that the CA model can provide excellent explanatory power in both markets despite their different characteristics. Similar results were found in subsamples formed by industry or by firm market value. The CA model has excellent explanatory power in the detailed industry subsamples, and its explanatory power is also good regardless of market value. The tables reporting the explanatory power for these subsamples in detail are S1-S3 Tables in S1 Appendix.
Penny stocks tend to have high returns, systematic risk, and unsystematic risk. Therefore, the traditional asset pricing models have low explanatory power over penny stocks. We define penny stocks as those with a closing price of 5,000 won or less in the Korean market, following Kim and Kang [85]. Table 7 illustrates the results with the entire sample, the sample excluding penny stocks, and penny stocks. The FF models indicate that the explanatory power increases when penny stocks are excluded from the sample. However, the CA models show no statistically significant differences related to penny stocks.
These results show that the traditional asset pricing model explains penny stocks poorly. In the CA model, by contrast, there is no difference between the explanatory power for penny stocks and that for other samples. That is, unlike the FF models, the CA model provides adequate explanatory power for stock returns with or without penny stocks.
The transaction cost and the irrationality of investors significantly reduce the explanatory power of the asset pricing models for the Korean markets [86]. Therefore, we examine the impact of transaction cost and investors' irrationality on total $R^2$. As a proxy for transaction cost, we use Roll's spread [87] as in Eq (7), where $\mathrm{cov}_i$ represents the autocovariance of the returns of stock i, using the daily returns from months t−12 to t−1. We convert all positive autocovariances to negative numbers when calculating the Roll spread, following Roll [87] and Lesmond [88].
$$ \text{roll spread}_i = \sqrt{-\mathrm{cov}_i} \tag{7} $$
Another estimate of transaction cost is based on the limited dependent variable model, following Lesmond et al. [88]. It includes spread effects, price impact effects, and market depth effects and can be expressed as Eq (8).
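The two transaction-cost proxies above can be sketched as follows. For the Roll measure, the first-order autocovariance estimator (divisor n − 1) is an assumption. For the limited dependent variable model, the piecewise mapping from the frictionless return to the observed return is the standard form from Lesmond et al. and is assumed here; it is an illustration of the idea, not a transcription of Eq (8). All function names are hypothetical.

```python
import math


def autocovariance(returns):
    """First-order sample autocovariance of a return series (divisor n - 1 assumed)."""
    n = len(returns)
    mean = sum(returns) / n
    cross = sum((returns[k] - mean) * (returns[k - 1] - mean) for k in range(1, n))
    return cross / (n - 1)


def roll_spread(daily_returns):
    """Roll spread as sqrt(-cov), with positive autocovariances flipped to negative."""
    cov = autocovariance(daily_returns)
    if cov > 0.0:
        cov = -cov
    return math.sqrt(-cov)


def ldv_observed_return(r_star, alpha1, alpha2):
    """Observed return implied by frictionless return r_star and cost thresholds.

    alpha1 < 0 is the sell-side cost and alpha2 > 0 the buy-side cost. Investors
    trade only when r_star moves beyond a threshold; otherwise the observed
    return is zero.
    """
    if r_star < alpha1:
        return r_star - alpha1
    if r_star > alpha2:
        return r_star - alpha2
    return 0.0


spread = roll_spread([0.01, -0.01, 0.01, -0.01])
observed = [ldv_observed_return(r, alpha1=-0.01, alpha2=0.02) for r in (-0.03, 0.01, 0.03)]
```

The band between the two thresholds is what generates zero observed returns, so a stock with many zero-return days is assigned a wide band and therefore a high estimated round-trip cost.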
In Eq (8), $\alpha_1(i) < 0$ is the sell-side transaction cost of stock i, $\alpha_2(i) > 0$ is the buy-side transaction cost, and $R(i, t)$ is the observed actual rate of return. Further, $R^*(i, t)$ is the rate of return in a market without unobserved friction according to the market model regression. Specifically, the observed return is generated by the behavior of investors who consider transaction costs. Additionally, the model assumes that the investor will act (buy or sell) when the expected profit (loss) exceeds the transaction cost. Assuming $R^*(i, t)$ follows a normal distribution, the log-likelihood of the above expression is given by Eq (9).
Table 7. This table reports the out-of-sample total $R^2$ (%) for individual stocks using observable Fama-French (FF) factor models and conditional autoencoder (CA) models (CA0 through CA3). Along with the entire sample, we illustrate the results with and without penny stocks. The t-value represents the test value of the monthly total $R^2$ difference from that of the entire sample. *, **, and *** denote the rejection of the null hypothesis of the absence of causality at the 10%, 5%, and 1% levels, respectively. https://doi.org/10.1371/journal.pone.0281783.t007
In Eq (9), $R_0$, $R_1$, and $R_2$ correspond to the cases where the observed rates of return are zero, negative, and positive, respectively. $\sigma^2$ is the variance estimated using the observed actual returns, and F denotes the cumulative distribution function of the standard normal distribution. Estimates of $\alpha_1$ and $\alpha_2$ can be obtained by maximizing the above log-likelihood function. $\alpha_2(i) - \alpha_1(i)$ is an estimate of the round-trip transaction cost of competitive and marginal investors. Following Lesmond et al. [89], we use $\alpha_2(i) - \alpha_1(i)$ as the transaction cost. Finally, we calculate retail composition as a proxy for investors' irrationality [90,91]. Individual investors tend to exhibit more behavioral bias in their trading than other investors, largely because of overconfidence and disposition effects [90][91][92][93]. We calculate retail composition as the trading volume of individual investors relative to the total trading volume, as expressed in Eq (10).

$$ \text{Retail Composition}_{i,t} = \frac{\text{Individual Trading Volume}_{i,t}}{\text{Total Trading Volume}_{i,t}} \tag{10} $$
Subsequently, we divide our entire sample into two groups, high and low, using each proxy, based on the upper and lower 30% in the cross-section. Using the method proposed by Racicot [94], previous work has shown that the traditional FF model lacks explanatory power for the flow factor and that illiquidity can have significant explanatory power under GMM estimation [95,96]. For robustness, we add a liquidity factor [97] to the traditional asset pricing model. We compare the model in which the liquidity factor (LIQ) is added to the K = 4 asset pricing model in Table 3 with the K = 5 model. The liquidity factor is a constructed variable: the LIQ factor is the average across stocks of $\gamma_{i,t}$ from the regression in Eq (11).
where r_{i,d,t} is the return of stock i on day d in month t, v_{i,d,t} is the dollar trading volume of stock i on day d in month t, and ε_{i,d,t} is the residual of stock i on day d in month t. Table 8 demonstrates that the explanatory power for stocks with large investor irrationality and trading restrictions is statistically significantly lower than that for the others. Regarding the magnitude of the reduction, FF5 displays negative explanatory power in the subsamples with large transaction costs and investor irrationality. This implies that most of the explanatory power of FF5 disappears. Adding the liquidity factor increases the explanatory power for highly liquid stocks, but the explanatory power for illiquid stocks remains lower. Consistent with Racicot and Rentz [95], adding the LIQ factor to the FF model does not yield a large gain in explanatory power. In contrast, the CA model exhibits an explanatory power of 8-10%. This indicates that the explanatory power of the traditional asset pricing models is still inferior to that of the CA model even when the LIQ factor is considered. In S4 Table in S1 Appendix, various cases are examined, such as adding the LIQ factor to FF5F according to the method of Racicot and Rentz [95]. In all cases, the main results are consistent.
In conclusion, despite the factors that lower the explanatory power of asset pricing models in the Korean market, the explanatory power of the CA models remains considerably more robust than that of the FF models. This suggests that a time-varying model is an effective measure of systematic risk [96], and that the CA model fits this purpose well, showing excellent explanatory power even for stocks with low liquidity.
This result confirms that the performance of the asset pricing models deteriorates as investor irrationality and transaction-limiting factors increase in the Korean market, which is consistent with Chae and Yang's [86] findings. In contrast, the CA model shows excellent explanatory power in all samples. This shows that the CA model overcomes the limitations of the existing asset pricing models in the Korean market and can serve as a model for explaining stock returns in that market.

Table 8. [...] factor models and conditional autoencoder (CA) models (CA0 through CA3). Panels A, B, and C represent the OOS total R² of the samples classified according to Roll's spread, Lesmond transaction cost, and retail composition size, respectively. The t-value represents the test value of the monthly total R² difference. *, **, and *** denote the rejection of the null hypothesis of the absence of causality at the 10%, 5%, and 1% levels, respectively. https://doi.org/10.1371/journal.pone.0281783.t008

To check the robustness, following Gu et al. [10], we retrain and refit the CA model using subsamples of stocks with odd and even tickers, respectively. The odd and even groups each comprise 1,612 samples, and their tickers do not overlap in any sample. We first train the model using only the odd sample and then test it on both the odd and even samples in the OOS period. Conversely, we train the model using only the even sample and again test it on both the odd and even samples in the OOS period. This verification method is advantageous in that it checks the robustness of the model to omission or arbitrary deformation of the sample. Table 9 illustrates the results.
The model trained with even samples yields an OOS total R² of 11.24% for even samples and 11.62% for odd samples. Further, the model trained with odd samples yields an OOS total R² of 11.94% for even samples and 10.87% for odd samples. Overall, the total R² is similar in all four cases. This indicates that the explanatory power of the CA model is stable even when some samples are omitted in the model training process.
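The odd/even design above can be sketched as a 2×2 grid of fit and evaluation subsamples. The snippet below is only an illustration: it assumes the total R² takes the non-demeaned form 1 − Σ(r − r̂)² / Σr² used by Gu et al. [10], assumes purely numeric ticker codes, and replaces the CA model with a hypothetical one-parameter slope model (`fit_slope`, `predict_slope`) on made-up data.

```python
def total_r2(actual, predicted):
    """Total R2 without demeaning: 1 - sum((r - r_hat)^2) / sum(r^2)."""
    sse = sum((r - p) ** 2 for r, p in zip(actual, predicted))
    ss = sum(r ** 2 for r in actual)
    return 1.0 - sse / ss


def parity_groups(tickers):
    """Split numeric ticker codes into even and odd groups."""
    groups = {"Even": [], "Odd": []}
    for t in tickers:
        groups["Even" if int(t) % 2 == 0 else "Odd"].append(t)
    return groups


def cross_parity_r2(train, test, fit, predict):
    """Fit on each parity group, then evaluate OOS total R2 on both groups.

    train/test map ticker -> list of (x, r) observations.
    Returns a dict keyed by (fit_group, eval_group).
    """
    groups = parity_groups(train.keys())
    table = {}
    for g_fit, fit_tickers in groups.items():
        model = fit([obs for t in fit_tickers for obs in train[t]])
        for g_eval, eval_tickers in groups.items():
            obs = [o for t in eval_tickers for o in test[t]]
            actual = [r for _, r in obs]
            pred = [predict(model, x) for x, _ in obs]
            table[(g_fit, g_eval)] = total_r2(actual, pred)
    return table


# Hypothetical stand-in for the CA model: a slope through the origin.
def fit_slope(obs):
    return sum(x * r for x, r in obs) / sum(x * x for x, _ in obs)


def predict_slope(slope, x):
    return slope * x


train = {"000010": [(1.0, 0.02), (2.0, 0.04)], "000011": [(1.0, 0.03), (2.0, 0.05)]}
test = {"000010": [(3.0, 0.06)], "000011": [(3.0, 0.08)]}
table = cross_parity_r2(train, test, fit_slope, predict_slope)
for key in sorted(table):
    print(key, round(table[key], 4))
```

Similar off-diagonal and diagonal entries in such a grid are what the text interprets as stability of the model to omitted samples.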

Portfolio alpha test
Now, we directly examine whether the CA model can explain the market anomalies using portfolio alpha tests. Specifically, following Gu et al. [10], we test whether the average of the residuals for each long-short portfolio created based on 38 firm characteristics is statistically different from zero.
To construct an estimate of the pricing error from the OOS data, we compute the mean difference between the actual return and the model estimation. The difference between the two values can be interpreted as the alpha (pricing error, α) of each portfolio, as in Eq (12).
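A minimal sketch of this pricing-error calculation is given below. It assumes a plain t-statistic based on the sample standard deviation of the monthly differences, with no autocorrelation adjustment; the study does not state which standard error it uses, and the function names and return series are hypothetical.

```python
import math
import statistics


def pricing_error(actual, fitted):
    """Return (alpha, t_stat) from one portfolio's OOS actual and model-implied returns."""
    diffs = [a - f for a, f in zip(actual, fitted)]
    alpha = statistics.mean(diffs)  # mean of (actual - model estimate)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))  # simple standard error (assumption)
    return alpha, alpha / std_err


def count_significant(t_stats, cutoff):
    """Number of portfolios whose |t| exceeds the cutoff."""
    return sum(1 for t in t_stats if abs(t) > cutoff)


# Hypothetical monthly returns of one long-short portfolio.
actual = [0.05, 0.04, 0.01, 0.03]
fitted = [0.04, 0.01, -0.01, 0.01]
alpha, t_stat = pricing_error(actual, fitted)
print(round(alpha, 4), round(t_stat, 3))
```

Counting how many characteristic-sorted portfolios have a t-statistic beyond a chosen cutoff, as `count_significant` does, gives one number per model that can be compared across models.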
We analyze the existence of alpha using the CA, FF3, and FF5 models. For comparison, we employ an identical number of factors used in each comparative model. For example, we compare the FF3 model with the CA model with K = 3, and the FF5 model with the CA model with K = 5.
After forming decile portfolios according to the magnitude of each firm characteristic variable, we measure the alpha of the 10-1 long-short portfolio. We construct both VW and EW portfolios and judge the significance of alpha using t-value cutoffs of 1.96 and 2.58, corresponding to the 95% and 99% confidence levels, respectively.

Table 9. Robustness test. This table reports the out-of-sample (OOS) total R² (%) for the subsamples of stocks that have odd and even permanent numbers, based on parameters estimated separately. The rows present the subsample {Even, Odd} for which we estimate the parameters, whereas the columns represent the subsample {Even, Odd} for which we evaluate the OOS performance. All estimates are based on the five-factor CA1 model. CA: conditional autoencoder.

Total R² (%)      Test: Even    Test: Odd
Train: Even       11.24         11.62
Train: Odd        11.94         10.87

Table 10 presents the full results. The number of significant alphas is smaller under the CA model than under the traditional asset pricing models. Moreover, under the stricter cutoff of 2.58, the number of significant alphas under the CA model tends to decrease as K increases. This trend appears in both the EW and VW portfolios. Overall, the CA models have superior power in explaining portfolio alphas or market anomalies compared to the traditional asset pricing models.
Intuitively, the presence of alphas explained by the CA models implies that omitted (unobserved) risk factors beyond the FF factors possibly generate excess returns. This indicates that the CA model effectively reflects factors that explain stock returns in addition to those covered by the traditional asset pricing models. Therefore, the conditional autoencoder structure used in this study has a clear advantage in deriving latent factors from market data. These pricing errors can be interpreted as the average gain of a long-short portfolio that has zero exposure to any systematic factor. That is, the anomalous returns associated with firm characteristic variables can be explained by risk factors, as claimed by Gu et al. [10], and the CA model explains these returns well.

APT strategy
To examine whether the CA model is a better fit for the Korean stock market than the traditional asset pricing models, we compare the long-short profits based on the expected returns calculated by the traditional asset pricing models and by our latent factor model, the CA model. Fig 5 shows undervalued and overvalued stocks under the CAPM. Stocks whose expected returns lie on the SML satisfy the CAPM and trade at their intrinsic value. An undervalued (overvalued) stock is plotted above (below) the SML, and its actual R_i is larger (smaller) than the return implied by the line. Similarly, the multi-factor asset pricing model is expressed as Eq (13).
$$\text{Undervalued stock: } E(r_{i,t}) - r_{i,t} > \text{cut-off value}$$

The pricing error is calculated as the difference between the expected rate of return and the actual rate of return. To eliminate trivial error values, we set the reference level of the average pricing error at 1%, using the past 1-month window. For example, a stock is classified as undervalued (overvalued) when the average difference between its expected return and its actual return over the last month is larger than 1% (smaller than −1%).
Then, we construct a long-short portfolio from the undervalued and overvalued stocks. The traditional asset pricing models used to calculate the expected return include CAPM, FF3F, FF4F, and FF5F (Table 11), and the CA models include CA1, CA3, CA4, and CA5 (Table 12).
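The classification and long-short construction described above can be sketched as follows. This is an illustration under stated assumptions: the legs are equal-weighted, the portfolio return is measured over the period following classification, and the stock identifiers and returns are made up; the study does not specify these details here.

```python
def classify(expected, actual, cutoff=0.01):
    """Label each stock by its average (expected - actual) return over the lookback window.

    expected/actual map stock id -> list of returns observed in the window.
    """
    labels = {}
    for stock in expected:
        diffs = [e - a for e, a in zip(expected[stock], actual[stock])]
        avg = sum(diffs) / len(diffs)
        if avg > cutoff:
            labels[stock] = "undervalued"
        elif avg < -cutoff:
            labels[stock] = "overvalued"
        else:
            labels[stock] = "neutral"
    return labels


def long_short_return(labels, next_returns):
    """Equal-weighted return of undervalued stocks minus that of overvalued stocks."""
    longs = [next_returns[s] for s, lab in labels.items() if lab == "undervalued"]
    shorts = [next_returns[s] for s, lab in labels.items() if lab == "overvalued"]
    if not longs or not shorts:
        return None  # one leg is empty, so no long-short position is formed
    return sum(longs) / len(longs) - sum(shorts) / len(shorts)


# Hypothetical one-observation lookback window and next-period returns.
expected = {"A": [0.03], "B": [0.00], "C": [0.010]}
actual = {"A": [0.01], "B": [0.02], "C": [0.012]}
labels = classify(expected, actual, cutoff=0.01)
ls = long_short_return(labels, {"A": 0.04, "B": -0.01, "C": 0.0})
print(labels, ls)
```

Raising `cutoff` restricts both legs to stocks with larger pricing errors, which is the kind of sensitivity check that varying the cutoff value performs.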
Table 11 shows the strategy performance under the traditional asset pricing models. Overvalued, Neutral, Undervalued, and LS denote the rates of return for overvalued stocks, stocks lying on the SML, undervalued stocks, and the long-short portfolio, respectively. The results show that the long-short strategy profit is not significant for any strategy using the pricing error of these models, including CAPM and FF5F. Even the rate of return from undervalued stocks is not statistically significant. Considering that the High group is defined as the stocks whose prices should decrease, and whose expected returns should increase, until they are plotted exactly on the line, our results indicate that the traditional asset pricing models are not valid for investment in the Korean market. These results are consistent with Kim and Kim [98] and Kang and Jang [99].
Table 12 shows the long-short profit based on undervalued and overvalued stocks under the CA models. The rate of return from undervalued stocks is statistically significant, unlike the results under the traditional asset pricing models. Also, the results from the long-short strategy are statistically significant except for the CA1 model, which has a single latent factor. In addition, our results confirm that as the number of latent factors increases, both the performance of the LS strategy and its significance level increase. This implies that the overall explanatory power of the CA model increases with the number of latent factors.
Since the performance of the strategy reviewed in this study can vary depending on the cutoff value and the lookback window for the pricing error, we examine this sensitivity. First, the cutoff value for the pricing error is set to 3%, 5%, and 10% instead of the baseline 1% to examine how performance changes. Table 13 shows the portfolio results for these different cutoff values. Similar to Table 12, the LS return increases from CA1 to CA5, and the significance level increases. In particular, the higher the cutoff value, the higher the rate of return of the LS strategy. This means that the effect is stronger when the strategy is based on stocks with large pricing errors.

Importance rankings of firm characteristics
Here, we identify the top 10 most important firm characteristics ex-post. To simplify, we set K = 5 in the CA model. We rank the importance of the characteristics by estimating the reduction in the total R² that results from setting the values of a given characteristic to zero while holding the remaining estimates fixed. We estimate the relative importance by standardizing the reduction in the total R² of each variable to [0, 1]. Subsequently, we rank the standardized values to check the order of importance of the variables in the CA model. This approach has a limitation: when the change in R² for one specific variable is large, the relative importance of the other variables may converge to zero. Fig 6 displays the top 10 most important variables in each CA model (K = 5) based on the average of the entire sample. Overall, market equity (mvel1), total return volatility (retvol), sales to price ratio (SP), and market beta (beta) are commonly selected as important variables for the model.
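The importance measure described above can be sketched in a few lines. The snippet is illustrative only: the fitted CA model is replaced by a hypothetical linear `predict` function with made-up weights, the total R² is assumed to be the non-demeaned form 1 − Σ(r − r̂)² / Σr², and the reductions are normalized to sum to one, which is one way of scaling them into [0, 1].

```python
def total_r2(actual, predicted):
    """Total R2 without demeaning: 1 - sum((r - r_hat)^2) / sum(r^2)."""
    sse = sum((r - p) ** 2 for r, p in zip(actual, predicted))
    return 1.0 - sse / sum(r ** 2 for r in actual)


def characteristic_importance(rows, returns, predict, names):
    """Relative importance = normalized drop in total R2 when one characteristic is zeroed."""
    base = total_r2(returns, [predict(x) for x in rows])
    drops = {}
    for j, name in enumerate(names):
        zeroed = [x[:j] + [0.0] + x[j + 1:] for x in rows]  # model itself is held fixed
        drops[name] = base - total_r2(returns, [predict(x) for x in zeroed])
    total = sum(drops.values())
    return {name: drop / total for name, drop in drops.items()}


# Hypothetical fitted model: returns depend strongly on the first characteristic only.
def predict(x):
    return 0.5 * x[0] + 0.1 * x[1] + 0.0 * x[2]


names = ["mvel1", "retvol", "beta"]
rows = [[1.0, 1.0, 1.0], [-1.0, 1.0, 0.0], [1.0, -1.0, 1.0], [-1.0, -1.0, 0.0]]
returns = [predict(x) for x in rows]
importance = characteristic_importance(rows, returns, predict, names)
ranking = sorted(names, key=importance.get, reverse=True)
print(importance, ranking)
```

In this toy example one characteristic absorbs almost all of the normalized importance, which mirrors the limitation noted above: a large R² change for one variable pushes the relative importance of the others toward zero, so ranks can be more informative than relative values.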
Additionally, Fig 7 depicts the importance rankings for all characteristics in each CA model with K = 5. The darker the color, the higher the variable importance. In contrast to Fig 6, Fig 7 displays ranks in order of importance rather than relative values. Because the importance of the top three variables is high in our case, the relative importance of the lower-ranked variables is close to zero when expressed as a relative value; we therefore plot ranks in Fig 7. The importance of firm characteristic variables is similar across models. In particular, the top five high-importance variables and the bottom five low-importance variables are similar in each model.
When the characteristic importance is calculated through the CA models, different results may be obtained depending on the model fitting process. Because the autoencoder is a type of neural network, several parameters must be fitted, and the model may overfit or underfit depending on the hyperparameters. Nevertheless, the characteristic importance in the autoencoder model is useful in assessing the latent factors. In particular, the CA model has important implications for identifying significant firm characteristic variables in asset pricing. By employing these, we can identify the factors that affect asset prices when the market changes rapidly, such as in a financial crisis. Therefore, we now examine the change in the importance of firm characteristics in the subsamples through the CA models.
We set the global financial crisis as the period from July 2007 to June 2009 and the COVID-19 pandemic as the period from January 2020 to December 2020. Fig 8 depicts the importance of the top 10 most influential variables during the financial crisis and the COVID-19 pandemic periods and demonstrates that market equity (mvel1) always has the highest importance. Furthermore, growth in long-term debt (lev) and gross profitability (gma) exhibit high importance during the financial crisis. This finding is in line with the existence of a large proportion of stocks with negative gross profit during the financial crisis (27.9%). However, in the COVID-19 period, illiquidity (ill), the ratio of the current price to the 52-week high price (high52), idiosyncratic return volatility (idiovol), and R&D expense to market capitalization (rd_mve) display high importance. This reflects the growing importance of market friction factors and tech stocks during 2020.
The results in Fig 8 show that different variables are considered influential in the CA model at different times. That is, unlike the factors defined in advance in the traditional asset pricing models, the latent factors of the CA model can reflect a changing market environment. This is arguably the most distinctive characteristic of the CA model, and it may explain why the CA model has superior explanatory power compared to the FF model in this study. We expect that deriving the important variables of the CA model will enable more diverse analyses and applications in the future.

Conclusion
We applied machine learning-based CA asset pricing models to the Korean stock market. The autoencoder, one of the popular machine learning methods, was employed to extract latent factors, following Gu et al. [10]. The autoencoder generalizes PCA by including nonlinearity and is known to effectively extract latent factors and obtain dynamically changing coefficients of latent factors. Thus, the CA model can reflect external market information in financial applications.
We examined the explanatory power of the CA model for the Korean market. Subsequently, we compared the CA model with the traditional asset pricing models. Our results demonstrated that the CA model dominates the traditional models (e.g., the FF models) in terms of OOS R² and stability under various settings, including KOSDAQ, small stocks, penny stocks, illiquid stocks, and stocks traded by irrational investors. This result shows that the CA model can provide generalized explanatory power in markets other than the US market, which is the market most widely used in asset pricing studies. Also, the subsample analysis shows that the explanatory power of the existing asset pricing models differs depending on the sample, whereas the CA model shows excellent explanatory power for several subsamples. This indicates that the CA model sufficiently supplements the limitations of the existing asset pricing models, which lack explanatory power in specific subsamples. This indirectly shows the structural advantage of deep learning, in which the importance of input data changes dynamically through a hidden layer when estimating the latent factors of the CA model.

The CA model can explain several market anomalies that the FF models are unable to clarify. In other words, the latent factors capture common market risk factors that have not been considered in the existing asset pricing models. In addition, using the pricing errors of the asset pricing models, we compared strategies based on overvalued and undervalued stocks. The comparison reveals that the performance of the CA model was excellent. This means that the CA model can determine whether a stock is overvalued or undervalued more accurately than the traditional asset pricing models. Thus, the CA model explains the expected returns of stocks well. Lastly, the CA model also revealed the firm characteristics that are important in asset pricing and how their importance varies with macro-financial states.
This is an advantage of the CA model: it can identify the variables that had a significant impact over the entire sample period or in a specific period. In addition, it shows that the importance of each variable changes over time, which is a major feature of the CA model. This is a major difference from models such as the FF model, in which the factors are fixed and defined in advance. Owing to these characteristics, it can be inferred that the CA model shows superior explanatory power compared to the existing asset pricing models.

This study provides several research possibilities. First, trading strategies can be devised using the CA model. Second, more international studies beyond the Korean and U.S. markets are needed. Third, another machine learning-based dimension reduction technique can be compared with the CA model. Fourth, while we focus on an equity market, fixed income and other asset classes can also be analyzed.

Fig 6. Each importance within each model is normalized to sum to one. We set the number of latent factors (K) to five for comparison. CA: conditional autoencoder; acc: accruals; rd_sale: R&D to sales; dy: dividend to price ratio; mom1m: 1-month momentum; lev: growth in long-term debt; maxret: maximum daily return; SP: sales to price ratio; beta: market beta; mvel1: market equity; retvol: return volatility; depr: depreciation divided by PP&E; quick: quick ratio; cfp: cash flow to price ratio; convind: convertible debt indicator; currat: current ratio; ill: illiquidity; mom6m: 6-month momentum; egr: growth in common shareholder equity; high52: the ratio of the current price to the 52-week high price; betasq: market beta squared. https://doi.org/10.1371/journal.pone.0281783.g006

Fig 7. The columns correspond to individual models according to the number of hidden layers. The firm characteristic variables are sorted based on the rank sum for the conditional autoencoder model with K = 5. The most important characteristics are at the top and the least influential at the bottom. Additionally, the darker the color, the greater the influence of that variable. mvel1: market equity; retvol: return volatility; SP: sales to price ratio; beta: market beta; mom12m: 12-month momentum; lev: growth in long-term debt; maxret: maximum daily return; idiovol: idiosyncratic return volatility; high52: the ratio of the current price to the 52-week high price; betasq: market beta squared; dy: dividend to price ratio; cash: cash holdings; ill: illiquidity; gma: gross profitability; mom1m: 1-month momentum; rd_sale: R&D to sales; absacc: absolute accruals; egr: growth in common shareholder equity; depr: depreciation divided by PP&E; chmom: change in 6-month momentum; sgr: sales growth; cfp: cash flow to price ratio; mom36m: 36-month momentum; mom6m: 6-month momentum; acc: accruals; currat: current ratio; convind: convertible debt indicator; pchgm_pchsale: change in gross margin minus change in sales; chcsho: change in shares outstanding; rd_mve: R&D expense to market capitalization; ts: total skewness; agr: asset growth; hire: employee growth rate; lgr: growth in long-term debt; quick: quick ratio; pchcurrat: change in current ratio; pchdepr: change in depreciation; pchquick: change in quick ratio; CA: conditional autoencoder. https://doi.org/10.1371/journal.pone.0281783.g007

Fig 8. [...] importance within each model is normalized to sum to one. We set the number of latent factors (K) to five for comparison. mom1m: 1-month momentum; pchgm_pchsale: change in gross margin minus change in sales; high52: the ratio of the current price to the 52-week high price; dy: dividend to price ratio; absacc: absolute accruals; chmom: change in 6-month momentum; betasq: market beta squared; gma: gross profitability; lev: growth in long-term debt; mvel1: market equity; cash: cash holdings; ts: total skewness; chcsho: change in shares outstanding; mom12m: 12-month momentum; rd_mve: R&D expense to market capitalization; idiovol: idiosyncratic return volatility; ill: illiquidity.